TF (Term Frequency): This measures how often a term occurs in a document, relative to the document's length. The more often the term occurs, the higher its TF value. For example, if the term "apple" appears 10 times in a document of 100 words, the TF for "apple" is 10/100 = 0.1.
IDF (Inverse Document Frequency): This measures how rare, and therefore how distinctive, a term is across the entire corpus of documents. The more documents a term appears in, the lower its IDF value. For example, if we have 1000 documents and the term "apple" appears in 100 of them, the IDF for "apple" is log(1000/100) = 1, using a base-10 logarithm.
TF-IDF: This is the product of TF and IDF. A high TF-IDF score can indicate a term's importance in a single document relative to a set of documents or corpus. In our example, the TF-IDF for "apple" is 0.1 * 1 = 0.1.
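The "apple" calculation above can be reproduced with a minimal Python sketch. The function names `term_frequency` and `inverse_document_frequency` are illustrative, not taken from any library, and the base-10 logarithm is an assumption chosen because it matches log(1000/100) = 1 in the example. Other TF-IDF variants use the natural logarithm or add smoothing terms.

```python
import math


def term_frequency(term_count, total_terms):
    """Share of the document's words that are the given term."""
    return term_count / total_terms


def inverse_document_frequency(num_documents, num_documents_with_term):
    """Base-10 log of (corpus size / documents containing the term)."""
    return math.log10(num_documents / num_documents_with_term)


# "apple" appears 10 times in a 100-word document.
tf = term_frequency(10, 100)

# "apple" appears in 100 of the 1000 documents in the corpus.
idf = inverse_document_frequency(1000, 100)

# TF-IDF is the product of the two values.
tf_idf = tf * idf

print(f"TF = {tf}, IDF = {idf}, TF-IDF = {tf_idf}")
```

This sketch assumes the term appears in at least one document. If it appears in none, the division fails, which is one reason practical implementations often smooth the IDF denominator.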
Interpretation: A low TF value means that the term appears infrequently in the document. A low IDF value means that the term is common across many documents. A low TF-IDF value therefore means that the term is common across the corpus (low IDF), rare in the document in question (low TF), or both. Using our "apple" example: if "apple" appeared only 5 times in the document, TF would drop to 5/100 = 0.05 and TF-IDF to 0.05 * 1 = 0.05. If instead it appeared in 200 of the 1000 documents, IDF would drop to log(1000/200) ≈ 0.7 (base-10 logarithm) and TF-IDF to 0.1 * 0.7 = 0.07. With both changes, TF-IDF would be 0.05 * 0.7 = 0.035.
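To compare these scenarios side by side, the following sketch computes TF-IDF for the baseline "apple" example and for each variation. The helper `tf_idf` and its parameter names are hypothetical, and the base-10 logarithm is assumed to match the figures in the text.

```python
import math


def tf_idf(term_count, total_terms, num_documents, num_documents_with_term):
    """TF-IDF using a plain ratio for TF and a base-10 log for IDF."""
    tf = term_count / total_terms
    idf = math.log10(num_documents / num_documents_with_term)
    return tf * idf


# Baseline: 10 occurrences in 100 words, present in 100 of 1000 documents.
baseline = tf_idf(10, 100, 1000, 100)

# Lower TF only: the term appears 5 times instead of 10.
lower_tf = tf_idf(5, 100, 1000, 100)

# Lower IDF only: the term appears in 200 documents instead of 100.
lower_idf = tf_idf(10, 100, 1000, 200)

# Both changes together.
both = tf_idf(5, 100, 1000, 200)

for label, score in [
    ("baseline", baseline),
    ("lower TF", lower_tf),
    ("lower IDF", lower_idf),
    ("both", both),
]:
    print(f"{label}: {score:.3f}")
```

Each change lowers the score on its own, and the combined case gives the lowest value. The unrounded IDF for 200 documents is slightly below 0.7, so the combined score is slightly below 0.035 before rounding.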
| Title | Price | List Ranking | Sponsored | Item Number |
|---|---|---|---|---|

| Term | TF | IDF | TF-IDF |
|---|---|---|---|

| Term | TF-IDF |
|---|---|